Search | BVS Regional Portal

Break-induced replication underlies formation of inverted triplications and generates unexpected diversity in haplotype structures.

Grochowski, Christopher M; Bengtsson, Jesse D; Du, Haowei; Gandhi, Mira; Lun, Ming Yin; Mehaffey, Michele G; Park, KyungHee; Höps, Wolfram; Benito-Garagorri, Eva; Hasenfeld, Patrick; Korbel, Jan O; Mahmoud, Medhat; Paulin, Luis F; Jhangiani, Shalini N; Muzny, Donna M; Fatih, Jawid M; Gibbs, Richard A; Pendleton, Matthew; Harrington, Eoghan; Juul, Sissel; Lindstrand, Anna; Sedlazeck, Fritz J; Pehlivan, Davut; Lupski, James R; Carvalho, Claudia M B.

bioRxiv ; 2023 Oct 03.

Article in English | MEDLINE | ID: mdl-37873367

ABSTRACT

Background: The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results: Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples revealing a DNA segment of ~2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions: These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers which contributes to generate diverse SV haplotypes in susceptible loci.

Assembly of 43 human Y chromosomes reveals extensive complexity and variation.

Hallast, Pille; Ebert, Peter; Loftus, Mark; Yilmaz, Feyza; Audano, Peter A; Logsdon, Glennis A; Bonder, Marc Jan; Zhou, Weichen; Höps, Wolfram; Kim, Kwondo; Li, Chong; Hoyt, Savannah J; Dishuck, Philip C; Porubsky, David; Tsetsos, Fotios; Kwon, Jee Young; Zhu, Qihui; Munson, Katherine M; Hasenfeld, Patrick; Harvey, William T; Lewis, Alexandra P; Kordosky, Jennifer; Hoekzema, Kendra; O'Neill, Rachel J; Korbel, Jan O; Tyler-Smith, Chris; Eichler, Evan E; Shi, Xinghua; Beck, Christine R; Marschall, Tobias; Konkel, Miriam K; Lee, Charles.

Nature ; 621(7978): 355-364, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date1 and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes2. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established1 boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.

Subject(s)

Human Y Chromosomes, Molecular Evolution, Humans, Male, Human Y Chromosomes/genetics, Human Genome/genetics, Genomics, Mutation Rate, Phenotype, Euchromatin/genetics, Pseudogenes, Genetic Variation/genetics, Human X Chromosomes/genetics, Pseudoautosomal Regions/genetics

Gaps and complex structurally variant loci in phased genome assemblies.

Porubsky, David; Vollger, Mitchell R; Harvey, William T; Rozanski, Allison N; Ebert, Peter; Hickey, Glenn; Hasenfeld, Patrick; Sanders, Ashley D; Stober, Catherine; Korbel, Jan O; Paten, Benedict; Marschall, Tobias; Eichler, Evan E.

Genome Res ; 33(4): 496-510, 2023 04.

Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

Subject(s)

Satellite DNA, Genetic Polymorphism, Humans, Satellite DNA/genetics, Haplotypes, Genomic Segmental Duplications, DNA Sequence Analysis

Inversion polymorphism in a complete human genome assembly.

Porubsky, David; Harvey, William T; Rozanski, Allison N; Ebler, Jana; Höps, Wolfram; Ashraf, Hufsah; Hasenfeld, Patrick; Paten, Benedict; Sanders, Ashley D; Marschall, Tobias; Korbel, Jan O; Eichler, Evan E.

Genome Biol ; 24(1): 100, 2023 04 30.

Article in English | MEDLINE | ID: mdl-37122002

ABSTRACT

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.

Subject(s)

Human Genome, Genetic Polymorphism, Humans, Genomic Structural Variation, Chromosome Inversion

Functional analysis of structural variants in single cells using Strand-seq.

Jeong, Hyobin; Grimes, Karen; Rauwolf, Kerstin K; Bruch, Peter-Martin; Rausch, Tobias; Hasenfeld, Patrick; Benito, Eva; Roider, Tobias; Sabarinathan, Radhakrishnan; Porubsky, David; Herbst, Sophie A; Erarslan-Uysal, Büsra; Jann, Johann-Christoph; Marschall, Tobias; Nowak, Daniel; Bourquin, Jean-Pierre; Kulozik, Andreas E; Dietrich, Sascha; Bornhauser, Beat; Sanders, Ashley D; Korbel, Jan O.

Nat Biotechnol ; 41(6): 832-844, 2023 06.

Article in English | MEDLINE | ID: mdl-36424487

ABSTRACT

Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation, enrichment of a primitive cell state and informed successful targeting of the subclone in cell culture, using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.

Subject(s)

Chromothripsis, Leukemia, Neoplasms, Humans, Neoplasms/genetics, Leukemia/genetics, Gene Rearrangement, Cell Line, Genomic Structural Variation

Semi-automated assembly of high-quality diploid human reference genomes.

Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.

Nature ; 611(7936): 519-531, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Subject(s)

Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics

A high-resolution map of small-scale inversions in the gibbon genome.

Mercuri, Ludovica; Palmisano, Donato; L'Abbate, Alberto; D'Addabbo, Pietro; Montinaro, Francesco; Catacchio, Claudia Rita; Hasenfeld, Patrick; Ventura, Mario; Korbel, Jan O; Sanders, Ashley D; Maggiolini, Flavia Angela Maria; Antonacci, Francesca.

Genome Res ; 32(10): 1941-1951, 2022 10.

Article in English | MEDLINE | ID: mdl-36180231

ABSTRACT

Gibbons are the most speciose family of living apes, characterized by a diverse chromosome number and rapid rate of large-scale rearrangements. Here we performed single-cell template strand sequencing (Strand-seq), molecular cytogenetics, and deep in silico analysis of a southern white-cheeked gibbon genome, providing the first comprehensive map of 238 previously hidden small-scale inversions. We determined that more than half are gibbon specific, at least fivefold higher than shown for other primate lineage-specific inversions, with a significantly high number of small heterozygous inversions, suggesting that accelerated evolution of inversions may have played a role in the high sympatric diversity of gibbons. Although the precise mechanisms underlying these inversions are not yet understood, it is clear that segmental duplication-mediated NAHR only accounts for a small fraction of events. Several genomic features, including gene density and repeat (e.g., LINE-1) content, might render these regions more break-prone and susceptible to inversion formation. In the attempt to characterize interspecific variation between southern and northern white-cheeked gibbons, we identify several large assembly errors in the current GGSC Nleu3.0/nomLeu3 reference genome comprising more than 49 megabases of DNA. Finally, we provide a list of 182 candidate genes potentially involved in gibbon diversification and speciation.

Subject(s)

Hominidae, Hylobates, Animals, Hylobates/genetics, Genome, Primates/genetics, Chromosome Inversion/genetics, Chromosomes, Hominidae/genetics

Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders.

Porubsky, David; Höps, Wolfram; Ashraf, Hufsah; Hsieh, PingHsun; Rodriguez-Martin, Bernardo; Yilmaz, Feyza; Ebler, Jana; Hallast, Pille; Maria Maggiolini, Flavia Angela; Harvey, William T; Henning, Barbara; Audano, Peter A; Gordon, David S; Ebert, Peter; Hasenfeld, Patrick; Benito, Eva; Zhu, Qihui; Lee, Charles; Antonacci, Francesca; Steinrücken, Matthias; Beck, Christine R; Sanders, Ashley D; Marschall, Tobias; Eichler, Evan E; Korbel, Jan O.

Cell ; 185(11): 1986-2005.e26, 2022 05 26.

Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.

Subject(s)

Chromosome Inversion, Genomic Segmental Duplications, Chromosome Inversion/genetics, DNA Copy Number Variations/genetics, Human Genome, Genomics, Humans

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Ebert, Peter; Audano, Peter A; Zhu, Qihui; Rodriguez-Martin, Bernardo; Porubsky, David; Bonder, Marc Jan; Sulovari, Arvis; Ebler, Jana; Zhou, Weichen; Serra Mari, Rebecca; Yilmaz, Feyza; Zhao, Xuefang; Hsieh, PingHsun; Lee, Joyce; Kumar, Sushant; Lin, Jiadong; Rausch, Tobias; Chen, Yu; Ren, Jingwen; Santamarina, Martin; Höps, Wolfram; Ashraf, Hufsah; Chuang, Nelson T; Yang, Xiaofei; Munson, Katherine M; Lewis, Alexandra P; Fairley, Susan; Tallon, Luke J; Clarke, Wayne E; Basile, Anna O; Byrska-Bishop, Marta; Corvelo, André; Evani, Uday S; Lu, Tsung-Yu; Chaisson, Mark J P; Chen, Junjie; Li, Chong; Brand, Harrison; Wenger, Aaron M; Ghareghani, Maryam; Harvey, William T; Raeder, Benjamin; Hasenfeld, Patrick; Regier, Allison A; Abel, Haley J; Hall, Ira M; Flicek, Paul; Stegle, Oliver; Gerstein, Mark B; Tubio, Jose M C.

Science ; 372(6537), 2021 04 02.

Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.

Subject(s)

Genetic Variation, Human Genome, Haplotypes, Female, Genotype, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, Interspersed Repetitive Sequences, Male, Population Groups/genetics, Quantitative Trait Loci, Retroelements, DNA Sequence Analysis, Sequence Inversion, Whole Genome Sequencing
